330 PART 6 Analyzing Survival Data

remarkably predictable shapes or distributions (the most common being the

Weibull distribution, covered in Chapter 24). Because of this, these disciplines

often use a parametric form of survival regression, which assumes that you

can represent the survival curves by algebraic formulas. Unfortunately for

biostatisticians, biological data tends to produce nonparametric survival curves

whose distributions can’t be represented by these parametric distributions.

As described earlier, nonparametric survival analyses using life tables,

Kaplan-Meier plots, and log-rank tests are limiting. But as biostatisticians, we could not

rely on using parametric distributions in our models; we wanted to use a hybrid,

semi-parametric kind of survival regression. We wanted one that was partly

nonparametric, meaning it didn’t assume any mathematical formula for the shape of

the overall survival curve, and partly parametric, meaning we could use some

parameter (or predicted survival distribution shape) to guide our formulas the

way other industries used the Weibull distribution. In 1972, a statistician named

David Cox developed a workable method for doing this. The procedure is now

called Cox proportional hazards regression, which we call PH regression for the rest of

this chapter for brevity. In the following sections, we outline the steps of

performing a PH regression.
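
To make the "partly nonparametric, partly parametric" idea concrete, here is how the model is usually written. The symbols are our own shorthand rather than notation introduced in this chapter: h_0(t) stands for a baseline hazard, x_1 through x_k are the predictors, and beta_1 through beta_k are their regression coefficients.

```latex
h(t \mid x_1, \dots, x_k) = h_0(t)\,\exp(\beta_1 x_1 + \cdots + \beta_k x_k)
```

The baseline hazard h_0(t) is left unspecified, which is the nonparametric part. The exponential term has a fixed algebraic form with coefficients to estimate, which is the parametric part. Because h_0(t) cancels when you divide the hazards of two subjects, their ratio does not depend on time, which is where the name proportional hazards comes from.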

Since 1972, many issues have been identified with using survival regression for

biological data, especially regarding whether it is appropriate for the type of data.

One way to check this is to run a logistic regression model (see Chapter 18)

with the same predictors and outcome as your survival regression model, but without

the time variable, and see whether the interpretation changes.

The steps to perform a PH regression

You can understand PH regression in terms of several conceptual steps, although

when using statistical software like that described in Chapter 4, it may appear that

these steps take place simultaneously. That is because the output created is

designed for you — the biostatistician — to walk through the following steps in

your mind and make decisions. You must use the output to:

1. Determine the shape of the overall survival curve produced from the Kaplan-Meier method.

2. Estimate how your hypothesized predictor variables may impact the bends in this curve (in other words, in what ways your predictors may affect survival).
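
The two steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not what your statistical software actually does: the data are made up, the function names kaplan_meier and bend_curve are hypothetical, the Kaplan-Meier curve stands in for the baseline, and the coefficient beta is simply chosen by hand rather than estimated. Real PH regression software estimates the baseline and the coefficients together. The bending uses the standard proportional-hazards relationship: the survival curve for a subject with predictor value x equals the baseline curve raised to the power exp(beta * x).

```python
from math import exp, log


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times  -- follow-up time for each subject
    events -- 1 if the event occurred at that time, 0 if the subject was censored
    """
    curve = []
    surv = 1.0
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    for t in event_times:
        # Subjects still under observation just before time t
        at_risk = sum(1 for u in times if u >= t)
        # Subjects who had the event exactly at time t
        died = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        surv *= 1.0 - died / at_risk
        curve.append((t, surv))
    return curve


def bend_curve(baseline, beta, x):
    """Bend a baseline survival curve: S(t | x) = S0(t) ** exp(beta * x)."""
    hazard_ratio = exp(beta * x)
    return [(t, s ** hazard_ratio) for t, s in baseline]


# Hypothetical follow-up data (illustration only, not from the book)
times = [2, 3, 3, 5, 6, 8, 9, 12]
events = [1, 1, 0, 1, 0, 1, 1, 0]

# Step 1: shape of the overall survival curve
baseline = kaplan_meier(times, events)

# Step 2: a hand-picked coefficient; log(2) means the predictor doubles the hazard
beta = log(2)
bent = bend_curve(baseline, beta, x=1)

for (t, s0), (_, s1) in zip(baseline, bent):
    print(f"t={t:>2}  baseline={s0:.3f}  with predictor={s1:.3f}")
```

A positive coefficient pushes the curve down (worse survival), a negative one pushes it up, and a predictor value of 0 leaves the baseline unchanged.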